Add similarity search functions for tasks and tickets by dolliecoder · Pull Request #166 · AOSSIE-Org/Ell-ena

dolliecoder · 2026-02-13T14:47:55Z

This PR introduces SQL helper functions to enable vector similarity search for tasks and tickets using the pgvector setup added in PR1.

It is a follow-up, incremental step toward Issue #65. While PR1 introduced the description_embedding vector(768) columns for tasks and tickets (storage layer), this PR builds on that foundation by adding database-level similarity search functions (retrieval layer).
No embedding generation, indexing, or AI service integration is included here. This PR strictly enables semantic retrieval capability at the database level.

Dependency Note:
This PR depends on PR1, as it relies on the description_embedding columns introduced there. PR1 must be merged before this PR to ensure the functions execute against an existing schema. PR1: #160

Changes Made

Added get_similar_tasks(query_embedding, match_count) SQL function
Added get_similar_tickets(query_embedding, match_count) SQL function
Added a new Supabase migration file to maintain proper migration ordering

Each function:
Computes cosine distance with the pgvector <=> operator and reports similarity as 1 - distance
Returns the top-k most semantically similar rows
Skips rows without embeddings (description_embedding IS NOT NULL)
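
The retrieval behaviour listed above can be sketched outside the database. The following is a minimal Python illustration, not the PR's implementation: it assumes plain lists of floats in place of pgvector values, uses 2-dimensional vectors instead of 768 for readability, and the names `cosine_distance`, `get_similar`, and the sample rows are hypothetical. It mirrors the three points: rank by cosine distance, skip rows with no embedding, and return at most `match_count` rows with similarity reported as 1 - distance.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance (1 - cosine of the angle), the quantity pgvector's <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def get_similar(rows, query_embedding, match_count=3):
    """Hypothetical in-memory stand-in for get_similar_tasks / get_similar_tickets."""
    # WHERE description_embedding IS NOT NULL
    candidates = [r for r in rows if r["description_embedding"] is not None]
    # ORDER BY description_embedding <=> query_embedding (smallest distance first)
    candidates.sort(
        key=lambda r: cosine_distance(r["description_embedding"], query_embedding)
    )
    # LIMIT match_count, reporting similarity = 1 - distance
    return [
        {
            "id": r["id"],
            "title": r["title"],
            "similarity": 1.0
            - cosine_distance(r["description_embedding"], query_embedding),
        }
        for r in candidates[:match_count]
    ]


# Hypothetical sample data; row 3 has no embedding and is never returned.
rows = [
    {"id": 1, "title": "Fix login bug", "description_embedding": [1.0, 0.0]},
    {"id": 2, "title": "Write docs", "description_embedding": [0.0, 1.0]},
    {"id": 3, "title": "No embedding yet", "description_embedding": None},
    {"id": 4, "title": "Login timeout", "description_embedding": [0.9, 0.1]},
]
top = get_similar(rows, [1.0, 0.0], match_count=2)
```

This sketch does not guard against zero-length vectors, which would divide by zero.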

✅ Checklist

I have read the contributing guidelines.

I have added tests that prove my fix is effective or that my feature works.
(Not applicable – database-level capability addition only.)

I have added necessary documentation (if applicable).
(Not required at this stage.)

Any dependent changes have been merged and published in downstream modules.
(Depends on PR1 – embedding schema changes.)

coderabbitai · 2026-02-13T14:48:12Z

📝 Walkthrough

This pull request adds two PostgreSQL PL/pgSQL functions for semantic similarity search. The get_similar_tasks and get_similar_tickets functions compute cosine similarity between query embeddings and stored embeddings, returning ranked results up to a configurable limit.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Database Vector Search Functions `supabase/migrations/20251021110000_task_ticket_vector_search.sql` | Adds two new PL/pgSQL functions (`get_similar_tasks` and `get_similar_tickets`) that perform semantic similarity search using vector embeddings. Both functions accept a query embedding (768-dimensional vector) and match count parameter, computing cosine similarity via the `<=>` operator and returning ranked results with similarity scores. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Embeddings dance in vectors bright,
Seven sixty-eight dimensions take flight,
Cosine whispers "find what's near,"
Similar tasks and tickets appear!
Search with similarity, clear and light. ✨

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Add similarity search functions for tasks and tickets' directly and clearly summarizes the main change: adding two SQL functions for vector-based similarity search on tasks and tickets. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into `main` |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@supabase/migrations/20251021110000_task_ticket_vector_search.sql`:
- Around line 17-20: Add HNSW vector indexes so the <=> similarity ORDER BY on
description_embedding uses an index: create IF NOT EXISTS idx_tasks_embedding on
tasks using hnsw for column description_embedding with vector_cosine_ops, and
create IF NOT EXISTS idx_tickets_embedding on tickets similarly for
description_embedding; ensure the new indexes are applied before running
similarity queries (also add the same index when you see other queries ordering
by description_embedding <=> query_embedding).

🧹 Nitpick comments (1)

supabase/migrations/20251021110000_task_ticket_vector_search.sql (1)
1-23: Consider LANGUAGE sql and marking as STABLE.

Since the function body is a single RETURN QUERY SELECT, PL/pgSQL is unnecessary overhead — plain LANGUAGE sql avoids the PL/pgSQL executor layer. Additionally, these functions have no side effects and return consistent results for the same inputs within a transaction, so marking them STABLE lets the planner optimize repeated calls.
♻️ Suggested diff
 CREATE OR REPLACE FUNCTION get_similar_tasks(
     query_embedding vector(768),
     match_count INT DEFAULT 3
 )
 RETURNS TABLE (
     task_id UUID,
     title TEXT,
     description TEXT,
     similarity FLOAT
-) AS $$
-BEGIN
-    RETURN QUERY
+) LANGUAGE sql STABLE AS $$
     SELECT
         t.id AS task_id,
         t.title,
         t.description,
         1 - (t.description_embedding <=> query_embedding) AS similarity
     FROM tasks t
     WHERE t.description_embedding IS NOT NULL
     ORDER BY t.description_embedding <=> query_embedding
     LIMIT match_count;
-END;
-$$ LANGUAGE plpgsql;
+$$;
Same applies to get_similar_tickets below.
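
For reference, the `similarity` column in the diff is one minus the cosine distance, so it ranges from 1 (same direction) through 0 (orthogonal) to -1 (opposite), and sorting by distance ascending is the same as sorting by similarity descending. A small Python sketch with hypothetical 2-dimensional vectors shows the three reference points; the helper names are illustrative and not part of the migration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (what <=> computes in pgvector)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def similarity(a, b):
    """Mirrors the SQL expression 1 - (description_embedding <=> query_embedding)."""
    return 1.0 - cosine_distance(a, b)


query = [1.0, 0.0]
same_direction = similarity([2.0, 0.0], query)   # magnitude does not matter
orthogonal = similarity([0.0, 3.0], query)
opposite = similarity([-1.0, 0.0], query)
```

Because only direction matters, a stored embedding that is a scaled copy of the query still scores as a perfect match.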

coderabbitai · 2026-02-13T14:49:48Z

supabase/migrations/20251021110000_task_ticket_vector_search.sql

+        1 - (t.description_embedding <=> query_embedding) AS similarity
+    FROM tasks t
+    WHERE t.description_embedding IS NOT NULL
+    ORDER BY t.description_embedding <=> query_embedding


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the migration file
find . -name "*20251021110000_task_ticket_vector_search.sql" -type f

Repository: AOSSIE-Org/Ell-ena

Length of output: 128

🏁 Script executed:

#!/bin/bash
# Check all SQL files for vector index creation patterns
rg -n 'hnsw|ivfflat|vector_cosine_ops|vector_l2_ops' --type sql

Repository: AOSSIE-Org/Ell-ena

Length of output: 44

🏁 Script executed:

#!/bin/bash
# Also search for CREATE INDEX with vector patterns
rg -n 'CREATE INDEX.*embedding|CREATE INDEX.*vector' --type sql -i

Repository: AOSSIE-Org/Ell-ena

Length of output: 44

🏁 Script executed:

#!/bin/bash
# List all migration files to understand the structure
find . -path "*/migrations/*.sql" -type f | sort

Repository: AOSSIE-Org/Ell-ena

Length of output: 743

🏁 Script executed:

#!/bin/bash
# Read the migration file to see its full content
cat -n ./supabase/migrations/20251021110000_task_ticket_vector_search.sql

Repository: AOSSIE-Org/Ell-ena

Length of output: 1518

🏁 Script executed:

#!/bin/bash
# Also check the meeting vector search migration for comparison
cat -n ./supabase/migrations/20251021090000_meeting_vector_search.sql

Repository: AOSSIE-Org/Ell-ena

Length of output: 4913

Add vector indexes for efficient similarity queries.

The <=> operator in both ORDER BY clauses will fall back to sequential scans without indexes. Add HNSW indexes to enable efficient vector search:

CREATE INDEX IF NOT EXISTS idx_tasks_embedding ON tasks USING hnsw (description_embedding vector_cosine_ops);
CREATE INDEX IF NOT EXISTS idx_tickets_embedding ON tickets USING hnsw (description_embedding vector_cosine_ops);

This is critical once the tables grow beyond a few thousand rows. If PR1 creates these indexes separately, this reminder can be disregarded.

Also applies to: 42-45

I left indexing out for now on purpose because I wanted to keep this PR focused on adding the similarity functions for tasks and tickets without expanding the scope too much. I can open another PR for indexing as the tables grow.

@dolliecoder, that makes sense! Keeping the PR focused is a good approach.

Would you like me to create a follow-up issue to track adding the HNSW indexes for tasks.description_embedding and tickets.description_embedding? This way it won't be forgotten as the tables grow.

dolliecoder · 2026-02-15T18:01:16Z

Just checking in on this PR. I'm happy to revise based on review feedback, so please let me know if there’s anything I can adjust to help move this forward.

Add similarity search functions for tasks and tickets

7dc3182

dolliecoder wants to merge 1 commit into AOSSIE-Org:main from dolliecoder:feat/task-ticket-vector-search
